Search | Portal Regional da BVS

Sentiment analysis in multilingual context: Comparative analysis of machine learning and hybrid deep learning models.

Das, Rajesh Kumar; Islam, Mirajul; Hasan, Md Mahmudul; Razia, Sultana; Hassan, Mocksidul; Khushbu, Sharun Akter.

Heliyon ; 9(9): e20281, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37809397

ABSTRACT

This research paper investigates the efficacy of various machine learning models, including deep learning and hybrid models, for text classification in the English and Bangla languages. The study focuses on sentiment analysis of comments from a popular Bengali e-commerce site, "DARAZ," which comprises both Bangla and translated English reviews. The primary objective is a comparative analysis of the models' efficacy in sentiment analysis. The research methodology includes implementing seven machine learning models and deep learning models, such as Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Convolutional 1D (Conv1D), and a combined Conv1D-LSTM. Preprocessing techniques are applied to a modified text set to enhance model accuracy. The major conclusion of the study is that Support Vector Machine (SVM) models outperform the other models, achieving an accuracy of 82.56% for English text sentiment analysis and 86.43% for Bangla text sentiment analysis using the Porter stemming algorithm. Additionally, the Bi-LSTM-based model performs best among the deep learning models, achieving an accuracy of 78.10% for English text and 83.72% for Bangla text using Porter stemming. This study represents significant progress in natural language processing research, particularly for Bangla, by improving text classification models and methodologies. The results contribute to the field of sentiment analysis and offer valuable insights for future research and practical applications.

BTSD: A curated transformation of sentence dataset for text classification in Bangla language.

Das, Rajesh Kumar; Islam, Mirajul; Khushbu, Sharun Akter.

Data Brief ; 50: 109445, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37577411

ABSTRACT

The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.

COVID-19 in Bangladesh: A Deeper Outlook into The Forecast with Prediction of Upcoming Per Day Cases Using Time Series.

Mohammad Masum, Abu Kaisar; Khushbu, Sharun Akter; Keya, Mumenunnessa; Abujar, Sheikh; Hossain, Syed Akhter.

Procedia Comput Sci ; 178: 291-300, 2020.

Article in English | MEDLINE | ID: mdl-33520018

ABSTRACT

On March 11, 2020, the World Health Organization (WHO) declared the coronavirus disease (COVID-19) outbreak a global pandemic. The coronavirus emerged in December 2019 in Wuhan, Hubei province, China. The infection has since spread to more than 200 countries worldwide, and cases have risen sharply in Bangladesh as well. Using a public dataset released by the IEDCR health authority, we developed a forecasting method for the COVID-19 outbreak in Bangladesh based on a deep learning model. We forecast up to 30 days ahead, predicting the daily numbers of confirmed cases, deaths, and recoveries. Furthermore, we show that a long short-term memory (LSTM) network follows the actual trends in the time series data and, in a comparative study, outperforms random forest (RF) regression and support vector regression (SVR), both of which are machine learning (ML) models. This paper considers the current COVID-19 outbreak in Bangladesh. The LSTM network, a well-known recurrent neural network (RNN) model, is used to forecast daily COVID-19 infections in Bangladesh from May 15, 2020 to June 15, 2020. A comparative study of LSTM, SVR, and RF regression is also included, evaluated using the root mean square error (RMSE) on the transmission rate. The severity of COVID-19 in Bangladesh has become acute, so the public health sector and the general public need to be aware of the situation and to know how long self-lockdown will have to be maintained. To the best of our knowledge, LSTM-based time series analysis is a well-established approach for forecasting infectious diseases.
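
The abstract above compares forecasting models by RMSE. As a rough illustration of what such a comparison involves, the sketch below computes RMSE for two sets of forecasts against the same observed series and picks the lower-error one. The daily counts and the names `model_a` and `model_b` are hypothetical placeholders invented for this example; they are not data or results from the paper, and this is not the authors' code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed daily case counts (illustrative only).
actual = [100, 120, 130, 150]

# Hypothetical forecasts from two unnamed models (illustrative only).
forecasts = {
    "model_a": [110, 115, 135, 145],
    "model_b": [90, 140, 120, 170],
}

# Score each model and select the one with the smallest RMSE.
scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f}")
print("lowest RMSE:", best)
```

A lower RMSE means the forecast stays closer to the observed values on average, with large misses penalized more heavily than small ones because the errors are squared before averaging.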
